123 research outputs found

    The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas

    Ontologies of research areas are important tools for characterising, exploring, and analysing the research landscape. Some fields of research are comprehensively described by large-scale taxonomies, e.g., MeSH in Biology and PhySH in Physics. Conversely, current Computer Science taxonomies are coarse-grained and tend to evolve slowly. For instance, the ACM classification scheme contains only about 2K research topics and the last version dates back to 2012. In this paper, we introduce the Computer Science Ontology (CSO), a large-scale, automatically generated ontology of research areas, which includes about 26K topics and 226K semantic relationships. It was created by applying the Klink-2 algorithm to a very large dataset of 16M scientific articles. CSO presents two main advantages over the alternatives: i) it includes a very large number of topics that do not appear in other classifications, and ii) it can be updated automatically by running Klink-2 on recent corpora of publications. CSO powers several tools adopted by the editorial team at Springer Nature and has been used to enable a variety of solutions, such as classifying research publications, detecting research communities, and predicting research trends. To facilitate the uptake of CSO, we have developed the CSO Portal, a web application that enables users to download, explore, and provide granular feedback on CSO at different levels. Users can use the portal to rate topics and relationships, suggest missing relationships, and visualise sections of the ontology. The portal will support the publication of and access to regular new releases of CSO, with the aim of providing a comprehensive resource to the various communities engaged with scholarly data.

    PREDICT: a method for inferring novel drug indications with application to personalized medicine

    The authors present a new method, PREDICT, for the large-scale prediction of drug indications, and demonstrate its use on both approved drugs and novel molecules. They also provide a proof-of-concept for its potential utility in predicting patient-specific medications.

    Automated annotation of chemical names in the literature with tunable accuracy

    BACKGROUND: A significant portion of the biomedical and chemical literature refers to small molecules. The accurate identification and annotation of compound names that are relevant to the topic of the given literature can establish links between scientific publications and various chemical and life science databases. Manual annotation is the preferred method for these works because well-trained indexers can understand the paper topics as well as recognize key terms. However, considering the hundreds of thousands of new papers published annually, an automatic annotation system with high precision and relevance can be a useful complement to manual annotation. RESULTS: An automated chemical name annotation system, MeSH Automated Annotations (MAA), was developed to annotate small molecule names in scientific abstracts with tunable accuracy. This system aims to reproduce the MeSH term annotations on biomedical and chemical literature that would be created by indexers. When automated free-text matching was compared with manual indexing on 26 thousand MEDLINE abstracts, more than 40% of the annotations were false-positive (FP) cases. To reduce the FP rate, MAA incorporated several filters to remove "incorrect" annotations caused by nonspecific, partial, and low-relevance chemical names. In part, relevance was measured by the position of the chemical name in the text. Tunable accuracy was obtained by adding or restricting the sections of the text scanned for chemical names. The best precision obtained was 96% with a 28% recall rate. The best performance of MAA, as measured with the F statistic, was 66%, which compares favorably to other chemical name annotation systems. CONCLUSIONS: Accurate chemical name annotation can help researchers not only identify important chemical names in abstracts, but also match unindexed and unstructured abstracts to chemical records. The current work is tested against MEDLINE, but the algorithm is not specific to this corpus and it is possible that the algorithm can be applied to papers from chemical physics, materials, polymer and environmental science, as well as patents, biological assay descriptions and other textual data.
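
The trade-off between the precision and recall figures quoted above can be made concrete with a small calculation. The sketch below assumes that the abstract's "F statistic" is the usual balanced F-measure (the harmonic mean of precision and recall); the abstract does not say so explicitly, and the function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (balanced F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was retrieved correctly
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Operating point quoted in the abstract: 96% precision at 28% recall.
# Assumption: the reported "F statistic" is the balanced F-measure.
high_precision_f1 = f_measure(0.96, 0.28)
print(round(high_precision_f1, 3))  # 0.434
```

Under this assumption the highest-precision setting scores about 0.43, which suggests that the reported best F value of 66% was obtained at a different setting, one that scans more of the text and trades some precision for recall.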

    CSI-OMIM - Clinical Synopsis Search in OMIM

    BACKGROUND: The OMIM database is a tool used daily by geneticists. Syndrome pages include a Clinical Synopsis section containing a list of known phenotypes comprising a clinical syndrome. The phenotypes are in free text and different phrases are often used to describe the same phenotype, the differences originating in spelling variations or typing errors, varying sentence structures and terminological variants. These variations hinder searching for syndromes or using the large amount of phenotypic information for research purposes. In addition, negation forms also create false positives when searching the textual description of phenotypes and induce noise in text mining applications. DESCRIPTION: Our method allows efficient and complete search of OMIM phenotypes as well as improved data-mining of the OMIM phenome. Applying natural language processing, each phrase is tagged with additional semantic information using UMLS and MeSH. Using a grammar-based method, annotated phrases are clustered into groups denoting similar phenotypes. These groups of synonymous expressions enable precise search, as query terms can be matched with the many variations that appear in OMIM, while avoiding over-matching expressions that include the query term in a negative context. On the basis of these clusters, we computed pair-wise similarity among syndromes in OMIM. Using this new similarity measure, we identified 79,770 new connections between syndromes, an average of 16 new connections per syndrome. Our project is Web-based and available at http://fohs.bgu.ac.il/s2g/csiomim CONCLUSIONS: The resulting enhanced search functionality provides clinicians with an efficient tool for diagnosis. This search application is also used for finding similar syndromes for the candidate gene prioritization tool S2G. The enhanced OMIM database we produced can be further used for bioinformatics purposes such as linking phenotypes and genes based on syndrome similarities and the known genes in Morbidmap.
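
The abstract does not state which similarity measure was computed over the phenotype clusters. As a rough illustration of the general idea only, the sketch below represents each syndrome as a set of phenotype-cluster identifiers and scores pairs with Jaccard similarity. The measure, the threshold, and all syndrome and cluster names are assumptions made up for this example, not the authors' method or data.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: shared clusters divided by all clusters in either set."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_pairs(
    syndromes: dict[str, set[str]], threshold: float
) -> list[tuple[str, str, float]]:
    """Return every syndrome pair whose similarity reaches the threshold."""
    pairs = []
    for first, second in combinations(sorted(syndromes), 2):
        score = jaccard(syndromes[first], syndromes[second])
        if score >= threshold:
            pairs.append((first, second, score))
    return pairs


# Hypothetical toy data: cluster ids stand in for groups of synonymous phrases.
toy_syndromes = {
    "syndrome_A": {"c1", "c2", "c3"},
    "syndrome_B": {"c2", "c3", "c4"},
    "syndrome_C": {"c5"},
}
print(similar_pairs(toy_syndromes, threshold=0.4))
# [('syndrome_A', 'syndrome_B', 0.5)]
```

Because the comparison runs on cluster identifiers rather than raw phrases, spelling variants of the same phenotype count as a match, which is the property the clustering step described above is meant to provide.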

    ARRDC3 suppresses breast cancer progression by negatively regulating integrin β4

    Large-scale genetic analyses of human tumor samples have been used to identify novel oncogenes, tumor suppressors and prognostic factors, but the functions and molecular interactions of many individual genes have not been determined. In this study we examined the cellular effects and molecular mechanism of the arrestin family member, ARRDC3, a gene preferentially lost in a subset of breast cancers. Oncomine data revealed that the expression of ARRDC3 decreases with tumor grade, metastases and recurrences. ARRDC3 overexpression represses cancer cell proliferation, migration, invasion, growth in soft agar and in vivo tumorigenicity, whereas downregulation of ARRDC3 has the opposite effects. Mechanistic studies showed that ARRDC3 functions in a novel regulatory pathway that controls the cell surface adhesion molecule, β-4 integrin (ITGβ4), a protein associated with aggressive tumor behavior. Our data indicate that ARRDC3 directly binds to a phosphorylated form of ITGβ4, leading to its internalization, ubiquitination and ultimate degradation. The results identify the ARRDC3-ITGβ4 pathway as a new therapeutic target in breast cancer and show the importance of connecting genetic arrays with mechanistic studies in the search for new treatments.

    Falls in young, middle-aged and older community dwelling adults: perceived cause, environmental factors and injury

    BACKGROUND: Falls in older people have been characterized extensively in the literature; however, little has been reported regarding falls in middle-aged and younger adults. The objective of this paper is to describe the perceived cause, environmental influences and resultant injuries of falls in 1497 young (20–45 years), middle-aged (46–65 years) and older (> 65 years) men and women from the Baltimore Longitudinal Study of Aging. METHODS: This was a descriptive study in which participants completed a fall history questionnaire describing the circumstances surrounding falls in the previous two years. RESULTS: The reporting of falls increased with age from 18% in young, to 21% in middle-aged and 35% in older adults, with higher rates in women than men. Ambulation was cited as the cause of the fall most frequently in all gender and age groups. Our population reported a higher percentage of injuries (70.5%) than previous studies. The young group reported injuries most frequently to wrist/hand, knees and ankles; the middle-aged to their knees; and the older group to their head and knees. Women reported a higher percentage of injuries in all age groups. CONCLUSION: This is the first study to compare falls in young, middle-aged and older men and women. Significant differences were found between the three age groups with respect to number of falls, activities engaged in prior to falling, perceived causes of the fall and where they fell.

    Benchmarking Ontologies: Bigger or Better?

    A scientific ontology is a formal representation of knowledge within a domain, typically including central concepts, their properties, and relations. With the rise of computers and high-throughput data collection, ontologies have become essential to data mining and sharing across communities in the biomedical sciences. Powerful approaches exist for testing the internal consistency of an ontology, but not for assessing the fidelity of its domain representation. We introduce a family of metrics that describe the breadth and depth with which an ontology represents its knowledge domain. We then test these metrics using (1) four of the most common medical ontologies with respect to a corpus of medical documents and (2) seven of the most popular English thesauri with respect to three corpora that sample language from medicine, news, and novels. Here we show that our approach captures the quality of ontological representation and guides efforts to narrow the breach between ontology and collective discourse within a domain. Our results also demonstrate key features of medical ontologies, English thesauri, and discourse from different domains. Medical ontologies have a small intersection, as do English thesauri. Moreover, dialects characteristic of distinct domains vary strikingly as many of the same words are used quite differently in medicine, news, and novels. As ontologies are intended to mirror the state of knowledge, our methods to tighten the fit between ontology and domain will increase their relevance for new areas of biomedical science and improve the accuracy and power of inferences computed across them.
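
The abstract does not define its breadth and depth metrics. One simple way to picture "breadth" is the share of a corpus's vocabulary that an ontology covers. The sketch below is an assumption for illustration only and is not one of the paper's metrics; the function name `coverage` and the toy terms are made up.

```python
def coverage(ontology_terms: set[str], corpus_tokens: list[str]) -> tuple[float, float]:
    """Return (type coverage, token coverage) of a corpus by an ontology.

    Type coverage: fraction of distinct corpus terms present in the ontology.
    Token coverage: fraction of all term occurrences present in the ontology,
    so frequent terms weigh more.
    """
    if not corpus_tokens:
        return 0.0, 0.0
    distinct = set(corpus_tokens)
    type_cov = len(distinct & ontology_terms) / len(distinct)
    token_cov = sum(tok in ontology_terms for tok in corpus_tokens) / len(corpus_tokens)
    return type_cov, token_cov


# Hypothetical toy data standing in for an ontology and a tokenised corpus.
toy_ontology = {"fever", "rash", "edema"}
toy_corpus = ["fever", "cough", "fever", "rash"]
type_cov, token_cov = coverage(toy_ontology, toy_corpus)
print(round(type_cov, 3), token_cov)  # 0.667 0.75
```

Reporting both numbers separates an ontology that covers many rare terms from one that covers only the few terms a corpus uses most often.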

    Randomized trial of thymectomy in myasthenia gravis
